Abstract

The following zero-sum game between nature and a statistician blends Bayesian
methods with frequentist methods such as p-values and confidence intervals.
Nature chooses a posterior distribution consistent with a set of possible priors.
At the same time, the statistician selects a parameter distribution for inference
with the goal of maximizing the minimum Kullback-Leibler information gained
over a confidence distribution or other benchmark distribution. An application
to testing a simple null hypothesis leads the statistician to report a posterior
probability of the hypothesis that is informed by both Bayesian and frequentist
methodology, each weighted according to how well the prior is known.

As is generally acknowledged, the Bayesian approach is ideal given knowl- 
edge of a prior distribution that can be interpreted in terms of relative fre- 
quencies. On the other hand, frequentist methods such as confidence intervals 
and p-values have the advantage that they perform well without knowledge of 
such a distribution of the parameters. Since neither the Bayesian approach 
nor the frequentist approach is entirely satisfactory in situations involving par- 
tial knowledge of the prior distribution, the proposed procedure reduces to a 
Bayesian method given complete knowledge of the prior, to a frequentist method 
given complete ignorance about the prior, and to a blend between the two meth- 
ods given partial knowledge of the prior. The blended approach resembles the 
Bayesian method rather than the frequentist method to the precise extent that 
the prior is known. 

The problem of testing a point null hypothesis illustrates the proposed frame- 
work. The blended probability that the null hypothesis is true is equal to the p- 
value or a lower bound of an unknown Bayesian posterior probability, whichever 
is greater. Thus, given total ignorance represented by a lower bound of 0, the 
p-value is used instead of any Bayesian posterior probability. At the opposite 
extreme of a known prior, the p-value is ignored. In the intermediate case, 
the possible Bayesian posterior probability that is closest to the p-value is used 
for inference. Thus, both the Bayesian method and the frequentist method 
influence the inferences made. 

Keywords: blended inference; confidence distribution; confidence posterior; hybrid
inference; maximum entropy; maxmin expected utility; minimum cross entropy;
minimum divergence; minimum information for discrimination; minimum relative
entropy; observed confidence level; robust Bayesian analysis

1 Introduction 

1.1 Motivation 

Various compromises between Bayesian and frequentist approaches to statistical
inference represent first attempts at combining attractive aspects of each
approach (Good, 1983). While the more recent hybrid inference approach of Yuan
(2009) succeeded in leveraging Bayesian point estimators with maximum likelihood
estimates, reducing to the former or the latter in the presence or absence of a
reliably estimated prior on all parameters, how to extend the theory beyond point
estimation is not yet clear.
Further, hybrid inference in its current form does not cover the case of a parameter 
of interest that has a partially known prior. Since such partial knowledge of a prior 
occurs in many scientific inference situations, it calls for a theoretical framework for 
method development that appropriately blends Bayesian and frequentist methods. 
Ideally, blended inference would meet these criteria: 

1. Complete knowledge of the prior. If the prior is known, the corresponding 
posterior is used for inference. Among statisticians, this principle is almost 
universally acknowledged. However, it is rarely the case that the prior is known
for all practical purposes.

2. Negligible knowledge of the prior. If there is no reliable knowledge of a 
prior, inference is based on methods that do not require such knowledge. This 
principle motivates not only the development of confidence intervals and p- 
values but also Bayesian posteriors derived from improper and data-dependent 
priors. Accordingly, blended inference must allow the use of such methods when 
applicable. 



3. Continuum between extremes. Inference relies on the prior to the extent
that it is known while relying on the other methods to the extent that it is
not known. Thus, there is a gradation of methodology between the above two 
extremes. The premise of this paper is that this intermediate scenario calls 
for a careful balance between pure Bayesian methods on one hand and impure 
Bayesian or non-Bayesian methods on the other hand. 

Instead of framing the knowledge of a prior in terms of confidence intervals, as in
pure empirical Bayes approaches, it will be framed more generally herein in terms of
a set of plausible priors, as in interval probability (Weichselberger, 2000;
Augustin, 2002, 2004) and robust Bayesian (Berger, 1984) approaches. Whereas the
concept of an unknown prior cannot arise in strict Bayesian statistics, it does
arise in robust Bayesian statistics when the levels of belief of an intelligent
agent have not been fully assessed (Berger, 1984). Unknown priors also occur in
many more objective contexts involving purely frequentist interpretations of
probability in terms of variability in the observable world rather than the
uncertainty in the mind of an agent. For example, frequency-based priors are
routinely estimated under random effects and empirical Bayes models; see, e.g.,
Efron (2010). (Remark 1 comments further on interpretations of probability and
relaxes the convenient assumption of a true prior.)

With respect to the problem at hand, the most relevant robust Bayesian approaches
are the minimax Bayes risk ("Γ-minimax") practice of minimizing the maximum Bayes
risk (Robbins, 1951; Berger, 1985; Vidakovic, 2000) and the maxmin expected
utility ("conditional Γ-minimax") practice of maximizing the minimum posterior
expected payoff or, equivalently, minimizing the maximum posterior expected loss
(Gilboa and Schmeidler, 1989; DasGupta and Studden, 1989; Vidakovic, 2000;
Augustin, 2002, 2004). Augustin (2004) reviews both methods in terms of interval
probabilities that need not be subjective. With typical loss functions, the former
method meets the above criteria for classical minimax alternatives to Bayesian
methods but does not apply to other attractive alternatives. For example, several
confidence intervals, p-values, and objective-Bayes posteriors routinely used in
biostatistics are not minimax optimal. (Fraser and Reid (1990) and Fraser (2004)
argued that requiring the optimality of frequentist procedures can lead to
trade-offs between hypothetical samples that potentially mislead scientists or
yield pathological procedures.) Optimality in the classical sense is not required
of the alternative procedures under the framework outlined below, which can be
understood in terms of maxmin expected utility with a payoff function that
incorporates the alternative procedures to be used as a benchmark for the Bayesian
posteriors.



1.2 Heuristic overview 



To define a general theory of blended inference that meets a formal statement of the
three criteria, Section 2 introduces a variation of a zero-sum game of Topsøe
(1979), Harremoës and Topsøe (2001), and Topsøe (2007). (The discrete version of
the game also appeared in Pfaffelhuber (1977), and Grünwald and Dawid (2004)
interpreted it as a special case of the maxmin expected utility problem.) The
"nature" opponent selects a prior consistent with the available knowledge as the
"statistician" player selects a posterior distribution with the aim of maximizing
the minimum information gained relative to one or more alternative methods. Such
benchmark methods may be confidence interval procedures, frequentist hypothesis
tests, or other techniques that are not necessarily Bayesian.

From that theory, Section 3 derives a widely applicable framework for testing
hypotheses. For concreteness, the motivating results are heuristically summarized
here. Consider the problem of testing $H_0 : \theta_* = 0$, the hypothesis that a
real-valued parameter $\theta_*$ of interest is equal to the point 0 on the real
line $\mathbb{R}$. The observed data vector $x$ is modeled as a realization of a
random variable denoted by $X$. Let $p(x)$ denote the p-value resulting from a
statistical test.



It has long been recognized that the p-value for a simple (point) null hypothesis is
often smaller than Bayesian posterior probabilities of the hypothesis (Lindley,
1957; Berger and Sellke, 1987). Suppose $\theta_*$ has an unknown prior
distribution according to which the prior probability of $H_0$ is $\pi_0$. While
$\pi_0$ is unknown, it is assumed to be no less than some known lower bound denoted
by $\underline{\pi}_0$.



Following the methodology of Berger et al. (1994), Sellke et al. (2001) found a
generally applicable lower bound on the Bayes factor. As Section 3.1 will explain,
that bound immediately leads to

$$\underline{\Pr}\left(H_0 \mid p(X) = p(x)\right) = \left(1 - \frac{1 - \underline{\pi}_0}{\underline{\pi}_0 \, e \, p(x) \log p(x)}\right)^{-1} \qquad (1)$$

as a lower bound on the unknown posterior probability of the null hypothesis for
$p(x) < 1/e$ and to $\underline{\pi}_0$ as a lower bound on the probability if
$p(x) \ge 1/e$.

In addition to $\Pr(H_0 \mid p(X) = p(x))$, the unknown Bayesian posterior
probability of $H_0$, there is a frequentist posterior probability of $H_0$ that
will guide selection of a posterior probability for inference based on
$\pi_0 \ge \underline{\pi}_0$ and other constraints summarized by
$\Pr(H_0 \mid p(X) = p(x)) \ge \underline{\Pr}(H_0 \mid p(X) = p(x))$. While it is
incorrect to interpret the p-value $p(x)$ as a Bayesian probability, it will be
seen in Section 3.2 that $p(x)$ is a confidence posterior probability that $H_0$ is
true.

With the confidence posterior as the benchmark, the solution to the optimization
problem described above gives the blended posterior probability that the null
hypothesis is true. It is simply the maximum of the p-value and the lower bound on
the Bayesian posterior probability:

$$\Pr(H_0; p(x)) = p(x) \vee \underline{\Pr}\left(H_0 \mid p(X) = p(x)\right). \qquad (2)$$



By plotting $\Pr(H_0; p(x))$ as a function of $p(x)$ and $\underline{\pi}_0$,
Figures 1 and 2 illustrate each of the above criteria for blended inference:

1. Complete knowledge of the prior. In this example, the prior is only known
when $\underline{\pi}_0 = 1$, in which case

$$\Pr(H_0; p(x)) = \underline{\Pr}\left(H_0 \mid p(X) = p(x)\right) = 1$$

for all $p(x)$. Thus, the p-value is ignored in the presence of a known prior.

2. Negligible knowledge of the prior. There is no knowledge of the prior when
$\underline{\pi}_0 = 0$ and negligible knowledge when $\underline{\pi}_0$ is so low
that $\underline{\Pr}(H_0 \mid p(X) = p(x)) < p(x)$. In such cases,
$\Pr(H_0; p(x)) = p(x)$, and the Bayesian posteriors are ignored.

3. Continuum between extremes. When $\underline{\pi}_0$ is of intermediate value
in the sense that $\underline{\Pr}(H_0 \mid p(X) = p(x))$ is exclusively between
$p(x)$ and 1,

$$\Pr(H_0; p(x)) = \underline{\Pr}\left(H_0 \mid p(X) = p(x)\right) < 1.$$

Consequently, $\Pr(H_0; p(x))$ increases gradually from $p(x)$ to 1 as
$\underline{\pi}_0$ increases (Figures 1 and 2). In this case, the blended
posterior lies in the set of allowed Bayesian posteriors but is on the boundary of
that set that is the closest to the p-value. Thus, both the p-value and the
Bayesian posteriors influence the blended posterior and thus the inferences made on
its basis.
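
The three cases above can be checked numerically. The following is only a
minimal sketch of equations (1) and (2); the function names and the example
values of the p-value and of the lower bound on the prior probability are
assumptions introduced here for illustration and do not come from the paper.

```python
import math


def posterior_lower_bound(p, pi0_lower):
    """Equation (1): lower bound on the posterior probability of the null.

    p is the observed p-value and pi0_lower is the known lower bound on the
    prior probability of the null hypothesis (hypothetical argument names).
    """
    if not (0.0 < p <= 1.0 and 0.0 <= pi0_lower <= 1.0):
        raise ValueError("need 0 < p <= 1 and 0 <= pi0_lower <= 1")
    if pi0_lower == 0.0:
        return 0.0  # total ignorance: the bound carries no information
    # Bound on the Bayes factor: -e p log p below 1/e, and 1 otherwise.
    bound = -math.e * p * math.log(p) if p < 1.0 / math.e else 1.0
    return 1.0 / (1.0 + (1.0 - pi0_lower) / (pi0_lower * bound))


def blended_probability(p, pi0_lower):
    """Equation (2): the larger of the p-value and the posterior lower bound."""
    return max(p, posterior_lower_bound(p, pi0_lower))


if __name__ == "__main__":
    # Known prior, ignorance, an intermediate case, and negligible knowledge.
    for p, pi0_lower in [(0.01, 1.0), (0.01, 0.0), (0.01, 0.5), (0.2, 0.05)]:
        print(p, pi0_lower, blended_probability(p, pi0_lower))
```

In the last case the lower bound on the posterior probability falls below the
p-value, so the p-value itself is reported, as in the second criterion.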



The plotted parameter distribution will be presented in Section 3.3 as a widely ap- 
plicable blended posterior. 

Finally, Section 4 offers additional details and generalizations in a series of
remarks.

Figure 1: Blended posterior probability that the null hypothesis is true versus the
p-value. The curves correspond to lower bounds of prior probabilities ranging in 5%
increments from 0% on the bottom to 100% on the top.

Figure 2: Blended posterior probability that the null hypothesis is true versus the
p-value and the lower bound of the prior probability that the null hypothesis is true.
The top plot displays the full domain, half of which is shown in the bottom plot.



2 General theory 

2.1 Preliminary notation and definitions 

Denote the observed data set, typically a vector or matrix of observations, by $x$,
a member of a set $\mathcal{X}$ that is endowed with a σ-algebra $\mathfrak{X}$.
The value of $x$ determines two sets of posterior distributions that can be blended
for inference about the value of a target parameter. Much of the following notation
is needed to transform general Bayesian posteriors and confidence posteriors or
other benchmark posteriors such that they are defined on the same measurable space,
that of the target parameter. (A confidence posterior, to be defined in Section
3.2.1, is a parameter distribution from which confidence intervals and p-values may
be extracted. As such, it facilitates blending typical frequentist procedures with
Bayesian procedures.)

2.1.1 Bayesian posteriors 

With some measurable space $(\Theta_*, \mathcal{A}_*)$ for parameter values in
$\Theta_*$, let $\dot{\mathcal{P}}_*^{\text{prior}}$ denote a set of probability
distributions on $(\mathcal{X} \times \Theta_*, \mathfrak{X} \otimes \mathcal{A}_*)$.
Any distribution in $\dot{\mathcal{P}}_*^{\text{prior}}$ is called a prior
(distribution), understood in the broad sense of a model that includes the possible
likelihood functions as well as the parameter distribution. It encodes the
constraints and other information available about the parameter before observing
$x$.

On the other hand, any distribution of a parameter is called a posterior
(distribution) if it depends on $x$. For some
$\dot{P}_*^{\text{prior}} \in \dot{\mathcal{P}}_*^{\text{prior}}$, an example of a
posterior distribution on $(\Theta_*, \mathcal{A}_*)$ is
$\dot{P}_* = \dot{P}_*^{\text{prior}}(\bullet \mid X = x)$, where $X$ is a random
variable of a distribution on $(\mathcal{X}, \mathfrak{X})$ that is determined by
$\dot{P}_*^{\text{prior}}$. $\dot{P}_*$ is called a Bayesian posterior
(distribution) since it is equal to a conditional distribution of the parameter
given $X = x$.

Adapting an apt term from Topsøe (2007), the set
$\dot{\mathcal{P}}_* = \{\dot{P}_*^{\text{prior}}(\bullet \mid X = x) : \dot{P}_*^{\text{prior}} \in \dot{\mathcal{P}}_*^{\text{prior}}\}$
of Bayesian posteriors on $(\Theta_*, \mathcal{A}_*)$ may be considered the
"knowledge base." For a set $\Theta$, if $\tau : \Theta_* \to \Theta$ is an
$\mathcal{A}_*$-measurable map and if $\dot{\theta}_*$ has distribution
$\dot{P}_* \in \dot{\mathcal{P}}_*$, then $\dot{\theta} = \tau(\dot{\theta}_*)$,
referred to as an inferential target of $\dot{P}_*$, has induced probability space
$(\Theta, \mathcal{A}, \dot{P})$. The set $\dot{\mathcal{P}}$ of all distributions
thereby induced and the set $\mathcal{P}$ of all probability distributions on
$(\Theta, \mathcal{A})$ are related by $\dot{\mathcal{P}} \subseteq \mathcal{P}$.



Example 1. In the hypothesis test of Section 1.2, $\dot{\theta} = 0$ if the null
hypothesis that $\dot{\theta}_* = 0$ is true and $\dot{\theta} = 1$ if the
alternative hypothesis that $\dot{\theta}_* \ne 0$ is true, where $\dot{\theta}_*$
and $\dot{\theta}$ are random variables with distributions respectively defined on
the Borel space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and the discrete space
$(\{0,1\}, 2^{\{0,1\}})$, where $2^{\{0,1\}}$ is the power set of $\{0,1\}$. Thus,
in this case, $\tau$ is the indicator function
$1_{(-\infty,0)\cup(0,\infty)} : \mathbb{R} \to \{0,1\}$, yielding
$\dot{\theta} = 1_{(-\infty,0)\cup(0,\infty)}(\dot{\theta}_*)$. Section 3 considers
this example in more detail.

A function that transforms a set of parameter distributions to a single parameter
distribution on the same measurable space is called an inference process (Paris,
1994; Paris and Vencovská, 1997). The resulting distribution is known as a
"representation" (Augustin, 2002) or "reduction" (Bickel, 2011a) of the set.
Perhaps the best known inference process for a discrete parameter set is that of
the maximum entropy principle, which would select a member of $\dot{\mathcal{P}}$
such that it has higher entropy than any other member of the set (see Remark 2).
This paper will propose a wide class of inference processes such that each
transforms $\dot{\mathcal{P}}$ to a member of $\mathcal{P}$ on the basis of the
following concept of a benchmark distribution on $(\Theta, \mathcal{A})$.

2.1.2 Benchmark posteriors 

For the convenience of the reader, the same Latin and Greek letters will be used for
the set of posteriors that will represent a gold standard or benchmark method of
inference as for the Bayesian posteriors of Section 2.1.1, with the double-dot
(as in $\ddot{P}$) replacing the single-dot (as in $\dot{P}$). Let
$\ddot{\mathcal{P}}_*$ represent a set of posterior distributions on some
measurable space $(\ddot{\Theta}_*, \ddot{\mathcal{A}}_*)$, and let
$\ddot{\mathfrak{P}}_*$ represent a set of such sets. For instance, considering any
$\ddot{P}_*$ in $\ddot{\mathcal{P}}_*$, $\ddot{P}_*$ may be a confidence posterior
(a fiducial-like distribution to be defined precisely in Section 3.2), a
generalized fiducial posterior of Hannig (2009), or even a Bayesian posterior based
on an improper prior. (In the first case, nested confidence intervals with inexact
coverage rates generate a set $\ddot{\mathcal{P}}_*$ of multiple confidence
posteriors rather than the single confidence posterior that is generated by exact
confidence intervals (Bickel, 2011a).) Suppose there exists a function
$\ddot{\tau} : \ddot{\mathfrak{P}}_* \to \Theta$ such that $\ddot{P}$, the
probability distribution of $\ddot{\tau}(\ddot{\mathcal{P}}_*)$, is defined on
$(\Theta, \mathcal{A})$. $\ddot{P}$ is called the benchmark posterior
(distribution), and $\ddot{\theta} = \ddot{\tau}(\ddot{\mathcal{P}}_*)$ is the
inferential target of $\ddot{\mathcal{P}}_*$. It follows that $\ddot{P}$ is in
$\mathcal{P}$ but not necessarily in $\dot{\mathcal{P}}$.

Example 2. Consider a model in which the full parameter $\theta_* \in \Theta_*$
consists of an interest parameter $\theta$ and a nuisance parameter $\lambda$. The
measurable space of $\theta_* = (\theta, \lambda)$ is denoted by
$(\Theta_*, \mathcal{A}_*)$, and that of $\theta$ by $(\Theta, \mathcal{A})$.
Suppose that a set of Bayesian posteriors is available for $\theta_*$ but that
nested confidence intervals are only available for an unknown parameter
$\theta \in \Theta$. It follows that a confidence posterior $\ddot{P}$ is available
on $(\Theta, \mathcal{A})$ but not on $(\Theta_*, \mathcal{A}_*)$. Then the
framework of this section can be applied by using the function $\tau$ such that
$\theta = \tau(\theta_*)$ in order to project the Bayesian posteriors onto
$(\Theta, \mathcal{A})$, the measurable space on which $\ddot{P}$ is defined. In
this case, since there is only one possible benchmark posterior, the function
$\ddot{\tau}$ need not be explicitly constructed.

The function $\ddot{\tau}$ allows consideration of a set of possible benchmark
posteriors by transforming it to a single benchmark posterior defined on
$(\Theta, \mathcal{A})$, the same measurable space as the above Bayesian posteriors
of $\dot{\theta}$. Since that function is unusual, two ways to compose it will now
be explained.

Example 3. Consider the inference process
$\Pi : \ddot{\mathfrak{P}}_* \to \mathcal{P}_*$, where $\mathcal{P}_*$ is the set of
all probability distributions on $(\ddot{\Theta}_*, \ddot{\mathcal{A}}_*)$. Define
the random variable $\ddot{\theta}_*$ to have distribution
$\ddot{P}_* = \Pi(\ddot{\mathcal{P}}_*)$. If $\tau : \ddot{\Theta}_* \to \Theta$ is
an $\ddot{\mathcal{A}}_*$-measurable function, then
$\ddot{\theta} = \tau(\ddot{\theta}_*)$ is the inferential target of $\ddot{P}_*$.
Further, the distribution $\ddot{P}$ of $\ddot{\theta}$ is the benchmark posterior.



Example 4. Whereas Example 3 applied an inference process before a parameter
transformation, this example reverses the order by first applying $\tau$. Let
$\ddot{\mathcal{P}}$ denote the subset of $\mathcal{P}$ consisting of all
distributions of the parameters transformed by $\tau$:

$$\ddot{\mathcal{P}} = \left\{\ddot{P} : \tau(\ddot{\theta}_*) \sim \ddot{P}, \; \ddot{\theta}_* \sim \ddot{P}_* \in \ddot{\mathcal{P}}_*\right\}.$$

Then an inference process transforms $\ddot{\mathcal{P}}$ to the benchmark
posterior $\ddot{P}$, which in turn is the distribution of $\ddot{\theta}$, the
inferential target of $\ddot{P}_*$.



2.2 Blended inference 

In terms of Radon-Nikodym differentiation, the information divergence of $P$ with
respect to $Q$ on $(\Theta, \mathcal{A})$ is

$$I(P \| Q) = \int dP \, \log\left(\frac{dP}{dQ}\right) \qquad (3)$$

if $P \ll Q$ and $I(P \| Q) = \infty$ otherwise. $I(P \| Q)$ is also known as
cross/relative entropy, $I$-divergence, information for discrimination, and
Kullback-Leibler divergence. Other measures of information may also be used
(Remark 3). For any posteriors $\dot{P} \in \dot{\mathcal{P}}$ and
$Q \in \mathcal{P}$, the inferential gain
$I(\dot{P} \| \ddot{P} \rightsquigarrow Q)$ of $Q$ relative to $\ddot{P}$ given
$\dot{P}$ is the amount of information gained by making inferences on the basis of
$Q$ instead of the benchmark posterior $\ddot{P}$:

$$I(\dot{P} \| \ddot{P} \rightsquigarrow Q) = I(\dot{P} \| \ddot{P}) - I(\dot{P} \| Q).$$

Let $\dot{\mathcal{P}}(\ddot{P})$ denote the largest subset of $\dot{\mathcal{P}}$
such that the information divergence of any of its members with respect to
$\ddot{P}$ is finite. That is,

$$\dot{\mathcal{P}}(\ddot{P}) = \left\{\dot{P} \in \dot{\mathcal{P}} : I(\dot{P} \| \ddot{P}) < \infty\right\}, \qquad (4)$$

which is nonempty by assumption. (The assumption is not necessary under the
generalization described in Remark 4.)

The blended posterior (distribution) $\hat{P}$ is the probability distribution on
$(\Theta, \mathcal{A})$ that maximizes the inferential gain relative to the
benchmark posterior given the worst-case posterior restricted by the constraints
that defined $\dot{\mathcal{P}}$ and $\dot{\mathcal{P}}(\ddot{P})$:

$$\inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P} \rightsquigarrow \hat{P}) = \sup_{Q \in \mathcal{P}} \; \inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P} \rightsquigarrow Q), \qquad (5)$$

where the supremum and infimum over any set including an indeterminate number are
$\infty$ and $-\infty$, respectively (Topsøe, 2007). Inferences based on $\hat{P}$
are blended in the sense that they depend on both $\dot{\mathcal{P}}$ and
$\ddot{P}$ in the ways to be specified in Section 2.3.

The main result of Theorem 2 of Topsøe (2007) gives a simply stated solution of the
optimality problem of equation (5) under broad conditions.

Proposition 1. If $I(\dot{P} \| \ddot{P}) < \infty$ for some
$\dot{P} \in \dot{\mathcal{P}}$ and if $\dot{\mathcal{P}}(\ddot{P})$ is convex,
then the blended posterior $\hat{P}$ is the probability distribution in
$\dot{\mathcal{P}}$ that minimizes the information divergence with respect to the
benchmark posterior:

$$I(\hat{P} \| \ddot{P}) = \inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P}). \qquad (6)$$

Proof. Topsøe (2007) proved the result from inequalities of information theory
given the additional stated condition of his Theorem 2 that
$I(\dot{P} \| \ddot{P}) < \infty$ for all
$\dot{P} \in \dot{\mathcal{P}}(\ddot{P})$. (See Remark 4.) The condition that
$I(\dot{P} \| \ddot{P}) < \infty$ for some $\dot{P} \in \dot{\mathcal{P}}$ and the
above definition of $\dot{\mathcal{P}}(\ddot{P})$ ensure that the condition is
met. □

Alternatively, the minimization of information divergence may define $\hat{P}$
rather than result from its definition in terms of the game (Remark 5).
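
The agreement between the game of equation (5) and the minimum-divergence
solution of equation (6) can be illustrated by brute force in the simplest case
of a binary parameter. The sketch below is not Topsøe's argument: it assumes a
Bernoulli setting in which each distribution is identified with its probability
that the parameter is 0, a set of Bayesian posteriors given by an interval of such
probabilities, and a finite grid of candidate distributions. All function names
and numerical values are hypothetical.

```python
import math


def kl_bernoulli(r, s):
    """Information divergence I(P||Q) when P(theta=0)=r and Q(theta=0)=s."""
    total = 0.0
    for p_mass, q_mass in ((r, s), (1.0 - r, 1.0 - s)):
        if p_mass > 0.0:  # convention: 0 log 0 = 0
            if q_mass == 0.0:
                return math.inf  # P is not absolutely continuous w.r.t. Q
            total += p_mass * math.log(p_mass / q_mass)
    return total


def inferential_gain(r, benchmark, q):
    """Gain of using q instead of the benchmark when nature's posterior is r."""
    return kl_bernoulli(r, benchmark) - kl_bernoulli(r, q)


def solve_game(lower, benchmark, steps=100):
    """Return (maximin choice, minimum-divergence choice) on a grid.

    Nature may pick any grid probability r >= lower; the statistician may
    pick any grid probability strictly between 0 and 1 (an assumption that
    keeps every divergence finite).
    """
    grid = [k / steps for k in range(steps + 1)]
    bayes = [r for r in grid if r >= lower]
    candidates = [q for q in grid if 0.0 < q < 1.0]

    def worst_case_gain(q):
        return min(inferential_gain(r, benchmark, q) for r in bayes)

    maximin = max(candidates, key=worst_case_gain)
    projection = min(bayes, key=lambda r: kl_bernoulli(r, benchmark))
    return maximin, projection


if __name__ == "__main__":
    print(solve_game(lower=0.3, benchmark=0.05))  # benchmark outside the set
    print(solve_game(lower=0.3, benchmark=0.6))   # benchmark inside the set
```

In both runs the two returned values coincide: the statistician's maximin choice
is the member of the interval closest in divergence to the benchmark, which is
the boundary point when the benchmark lies outside the interval and the benchmark
itself when it lies inside.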

2.3 Properties of blended inference 

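The desiderata of Section 1 for blended inference can now be formalized. A posterior
distribution $\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P})$ on
$(\Theta, \mathcal{A})$ is said to blend the set $\dot{\mathcal{P}}$ of Bayesian
posteriors with the benchmark posterior $\ddot{P}$ for inference about the
parameter in $\Theta$ provided that
$\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P})$ satisfies the following criteria
under the conditions of Proposition 1:

1. Complete knowledge of the prior. If $\dot{\mathcal{P}}$ has a single member
$\dot{P}$, then $\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P}) = \dot{P}$.

2. Negligible knowledge of the prior. If $\ddot{P} \in \dot{\mathcal{P}}$ and if
$\dot{\mathcal{P}}$ has at least two members, then
$\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P}) = \ddot{P}$.

3. Continuum between extremes. For any $D > 0$ and any
$\dot{\mathcal{P}}^* \subseteq \mathcal{P}$ such that

$$\sup_{\dot{P}' \in \dot{\mathcal{P}}(\ddot{P}), \; \dot{P}'' \in \dot{\mathcal{P}}^*} \left| I(\dot{P}' \| \ddot{P}) - I(\dot{P}'' \| \ddot{P}) \right| < D \qquad (7)$$

and such that $\dot{\mathcal{P}}(\ddot{P}) \cup \dot{\mathcal{P}}^*$ is convex,

$$\left| I\left(\hat{P}(\bullet; \dot{\mathcal{P}} \cup \dot{\mathcal{P}}^*, \ddot{P}) \,\|\, \ddot{P}\right) - I\left(\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P}) \,\|\, \ddot{P}\right) \right| < D. \qquad (8)$$

Theorem 1. The blended posterior $\hat{P}$ blends the set $\dot{\mathcal{P}}$ of
Bayesian posteriors with the benchmark posterior $\ddot{P}$ for inference about the
parameter in $\Theta$.

Proof. Since the criteria are only required under the conditions of Proposition 1,
it will suffice to prove that the criteria follow from equation (6). If
$\dot{\mathcal{P}}$ has a single member $\dot{P}$, then equation (6) implies that
$\hat{P} = \dot{P}$, thereby ensuring Criterion 1. Similarly, if
$\ddot{P} \in \dot{\mathcal{P}}$, then equation (6) implies that
$\hat{P} = \ddot{P}$, thus proving that Criterion 2 is met. Assume, contrary to
Criterion 3, that there exist a $D > 0$ and a
$\dot{\mathcal{P}}^* \subseteq \mathcal{P}$ such that
$\dot{\mathcal{P}}(\ddot{P}) \cup \dot{\mathcal{P}}^*$ is convex, equation (7) is
true, and equation (8) is false with
$\hat{P}(\bullet; \dot{\mathcal{P}} \cup \dot{\mathcal{P}}^*, \ddot{P})$ and
$\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P})$ equal to the blended posteriors
respectively using $\dot{\mathcal{P}} \cup \dot{\mathcal{P}}^*$ and
$\dot{\mathcal{P}}$ as the sets of Bayesian posteriors. Then equation (6) can be
written as

$$I\left(\hat{P}(\bullet; \dot{\mathcal{P}} \cup \dot{\mathcal{P}}^*, \ddot{P}) \,\|\, \ddot{P}\right) = \inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P}) \cup \dot{\mathcal{P}}^*} I(\dot{P} \| \ddot{P}),$$

$$I\left(\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P}) \,\|\, \ddot{P}\right) = \inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P}).$$

Hence, with $a \wedge b$ signifying the minimum of $a$ and $b$,

$$\left| I\left(\hat{P}(\bullet; \dot{\mathcal{P}} \cup \dot{\mathcal{P}}^*, \ddot{P}) \,\|\, \ddot{P}\right) - I\left(\hat{P}(\bullet; \dot{\mathcal{P}}, \ddot{P}) \,\|\, \ddot{P}\right) \right| = \inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P}) - \left( \inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P}) \wedge \inf_{\dot{P} \in \dot{\mathcal{P}}^*} I(\dot{P} \| \ddot{P}) \right),$$

which cannot exceed the larger of 0 and
$\inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})} I(\dot{P} \| \ddot{P}) - \inf_{\dot{P} \in \dot{\mathcal{P}}^*} I(\dot{P} \| \ddot{P})$
and thus, according to equation (7), cannot exceed $D$. Therefore, the above
assumption that equation (8) is false is contradicted, thereby establishing
satisfaction of Criterion 3. □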

Example 5. Suppose the set of possible priors consists of a single frequency-matching
prior, i.e., a prior that leads to 95% posterior credible intervals that are equal to 95%
confidence intervals, etc. If the benchmark posterior is the confidence posterior that
yields the same confidence intervals, then it is the Bayesian posterior distribution
corresponding to the prior. In that case, the blended distribution is equal to that
Bayesian/confidence posterior. Thus, the first condition of blended inference applies.
The second condition would instead apply if the set of possible priors contained at
least one other prior in addition to the frequency-matching prior.

Criterion 3 is much stronger than the heuristic idea of continuity introduced in
Section 1.1. Its use of information divergence can be generalized to other measures
of divergence (Remark 3).



3 Blended hypothesis testing 

A fertile field of application for the theory of Section 2 is that of testing
hypotheses, as outlined in Section 1.2. Building on Example 1, this section provides
methodology for a wide class of models used in hypothesis testing.



3.1 A bound on the Bayesian posterior 



Defining that class in terms of the concepts of Section 2.1.1 requires additional
notation. For a continuous sample space $\mathcal{X}$ and a function
$p : \mathcal{X} \to [0,1]$ such that $p(X) \sim U(0,1)$ under a null hypothesis,
each $p(x)$ for any $x \in \mathcal{X}$ will be called a p-value. Using some
dominating measure, let $f_0$ and $f_1$ denote probability density functions of
$p(X)$ under the null hypothesis ($\dot{\theta} = 0$) and under the alternative
hypothesis ($\dot{\theta} = 1$), respectively. For the observed $x$, the likelihood
ratio $f_0(p(x)) / f_1(p(x))$ is called the Bayes factor since, for a prior
distribution $\dot{P}^{\text{prior}}$, Bayes's theorem gives

$$\frac{\varphi(p(x))}{1 - \varphi(p(x))} = \frac{\dot{P}^{\text{prior}}(\dot{\theta} = 0)}{\dot{P}^{\text{prior}}(\dot{\theta} = 1)} \, \frac{f_0(p(x))}{f_1(p(x))}, \qquad (9)$$

where $\varphi(p(x)) = \dot{P}^{\text{prior}}(\dot{\theta} = 0 \mid p(X) = p(x))$.
Here, as $\varphi(p(x))$ is a local false discovery rate (LFDR), the letter
$\varphi$ abbreviates "false" (Efron, 2010; Bickel, 2011c). In a parametric
setting, $f_1(p(x))$ would be the likelihood integrated over the prior distribution
conditional on the alternative hypothesis.

Let $\kappa : \mathcal{X} \to \mathbb{R}$ denote the function defined by the
transformation $\kappa(x) = -\log p(x)$ for all $x \in \mathcal{X}$. Then a
probability density of $\kappa(X)$ under the null hypothesis is the standard
exponential density $g_0(\kappa(x)) = e^{-\kappa(x)}$. Assume that, under the
alternative hypothesis ($\dot{\theta} = 1$), $\kappa(X)$ admits a density function
$g_1$ with respect to the same dominating measure as $g_0$. It follows that
$g_0(\kappa(x)) / g_1(\kappa(x)) = f_0(p(x)) / f_1(p(x))$. The hazard rate
$h_1(\kappa(x))$ under the alternative is defined by
$h_1(\kappa(x)) = g_1(\kappa(x)) / \int_{\kappa(x)}^{\infty} g_1(k) \, dk$ for all
$x \in \mathcal{X}$, and $h_1 : (0, \infty) \to [0, \infty)$ is called the hazard
rate function.



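Sellke et al. (2001) obtained the following lower bound $\underline{b}(p(x))$ of the
Bayes factor $b(x)$.

Lemma 1. If $h_1$ is nonincreasing, then, for all $x \in \mathcal{X}$,

$$b(x) = \frac{f_0(p(x))}{f_1(p(x))} \ge \underline{b}(p(x)) = \begin{cases} -e \, p(x) \log p(x) & \text{if } p(x) < 1/e; \\ 1 & \text{if } p(x) \ge 1/e. \end{cases} \qquad (10)$$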



The condition on the hazard rate defines a wide class of models that is useful for
testing simple null hypotheses. A broad subclass will now be defined by imposing
constraints on $\pi_0 = \dot{P}^{\text{prior}}(\dot{\theta} = 0)$, the prior
probability that the null hypothesis is true, in addition to the hazard rate
condition. Specifically, $\pi_0$ is known to have $\underline{\pi}_0 \in [0,1]$ as
a lower bound. Thus, rearranging equation (9) as

$$\varphi(p(x)) = \left(1 + \frac{1 - \pi_0}{\pi_0 \, b(x)}\right)^{-1},$$

a lower bound on $\varphi(p(x))$ is

$$\underline{\Pr}\left(H_0 \mid p(X) = p(x)\right) = \underline{\varphi}(p(x)) = \left(1 + \frac{1 - \underline{\pi}_0}{\underline{\pi}_0 \, \underline{b}(p(x))}\right)^{-1},$$

leading to equation (1).

Let $\mathcal{P}$ consist of all probability distributions on
$(\Theta, \mathcal{A}) = (\{0,1\}, 2^{\{0,1\}})$. The subset $\dot{\mathcal{P}}$
consists of all $\dot{P} \in \mathcal{P}$ such that
$\dot{P}(\dot{\theta} = 0) \ge \underline{\varphi}(p(x))$.
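
The lower bound of Lemma 1 can be checked numerically against one concrete family
of alternatives. The sketch below assumes, purely for illustration, that the
p-value has a Beta(ξ, 1) density under the alternative with 0 < ξ ≤ 1, so that
$-\log p(X)$ is exponential with rate ξ and the hazard rate is constant and hence
nonincreasing. That family, the grid over ξ, and all function names are
assumptions introduced here rather than part of the paper's development.

```python
import math


def bayes_factor_bound(p):
    """Equation (10): -e p log p if p < 1/e, and 1 otherwise."""
    return -math.e * p * math.log(p) if p < 1.0 / math.e else 1.0


def beta_bayes_factor(p, xi):
    """f0(p)/f1(p) with f0 uniform and f1 the Beta(xi, 1) density xi*p**(xi-1)."""
    return p ** (1.0 - xi) / xi


def smallest_beta_bayes_factor(p, steps=10000):
    """Minimize the Bayes factor over a grid of xi values in (0, 1]."""
    return min(beta_bayes_factor(p, k / steps) for k in range(1, steps + 1))


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05, 0.2, 0.5):
        # The grid minimum should sit at or just above the bound.
        print(p, bayes_factor_bound(p), smallest_beta_bayes_factor(p))
```

Within this family the grid minimum essentially attains the bound, which suggests
that the bound cannot be improved without narrowing the class of alternatives.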

3.2 A confidence benchmark posterior 
3.2.1 Confidence posterior theory 



The following parametric framework facilitates the application of Section 2.1.2 to
hypothesis testing. The observation $x$ is an outcome of the random variable $X$ of
probability space $(\mathcal{X}, \mathfrak{X}, P_{\theta_*, \lambda_*})$, where the
interest parameter $\theta_* \in \Theta_*$ and a nuisance parameter $\lambda_*$ (in
some set $\Lambda_*$) are unknown. Let $S : \Theta_* \times \mathcal{X} \to [0,1]$
and $t : \mathcal{X} \times \Theta_* \to \mathbb{R}$ denote functions such that
$S(\bullet; x)$ is a distribution function, $S(\theta_*; X) \sim U(0,1)$, and

$$S(\theta_*; x) = P_{\theta_*, \lambda_*}\left(t(X; \theta_*) \ge t(x; \theta_*)\right)$$

for all $x \in \mathcal{X}$, $\theta_* \in \Theta_*$, and
$\lambda_* \in \Lambda_*$. $S$ is known as a significance function, and $t$ as a
pivot or test statistic. It follows that $p(x) = S(0; x)$ is a p-value for testing
the hypothesis that $\theta_* = 0$ and that
$[S^{-1}(\alpha; X), S^{-1}(\beta; X)]$ is a $(\beta - \alpha)100\%$ confidence
interval for $\theta_*$ given any $\alpha \in [0,1]$ and $\beta \in [\alpha, 1]$.
Thus, whether a significance function is found from p-values over a set of simple
null hypotheses or instead from a set of nested confidence intervals, it contains
the information needed to derive either (Schweder and Hjort, 2002; Singh et al.,
2007; Bickel, 2011a,b).



Let $\ddot{\theta}_*$ denote the random variable of the probability measure
$\ddot{P}_*$ that has $S(\bullet; x)$ as its distribution function. In other words,
$\ddot{P}_*(\ddot{\theta}_* \le \theta_*) = S(\theta_*; x)$ for all
$\theta_* \in \Theta_*$. $\ddot{P}_*$ is called a confidence posterior
(distribution) since it equates the frequentist coverage rate of a confidence
interval with the probability that the parameter lies in the fixed, observed
confidence interval:

$$\beta - \alpha = P_{\theta_*, \lambda_*}\left(\theta_* \in [S^{-1}(\alpha; X), S^{-1}(\beta; X)]\right) = \ddot{P}_*\left(\ddot{\theta}_* \in [S^{-1}(\alpha; x), S^{-1}(\beta; x)]\right)$$

for all $x \in \mathcal{X}$, $\theta_* \in \Theta_*$, and
$\lambda_* \in \Lambda_*$. The term "confidence posterior" (Bickel, 2011a,b) is
preferred here over the usual term "confidence distribution" (Schweder and Hjort,
2002) to emphasize its use as an alternative to Bayesian posterior distributions.
Polansky (2007), Singh et al. (2007), and Bickel (2011a) provide generalizations to
vector parameters of interest. Extensions based on multiple comparison procedures
are sketched in Remark 6.
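
A confidence posterior can be made concrete in a textbook setting. The sketch
below assumes a normal sample mean with known standard deviation, a case chosen
here only for illustration and not taken from the paper; in it the significance
function is the normal distribution function centered at the observed mean. The
function names, the simulation settings, and the numerical values are all
hypothetical.

```python
import random
from statistics import NormalDist


def confidence_posterior(sample_mean, sigma, n):
    """Confidence posterior of a normal mean when sigma is known (assumed)."""
    return NormalDist(mu=sample_mean, sigma=sigma / n ** 0.5)


def confidence_interval(posterior, alpha, beta):
    """Interval between the alpha and beta quantiles of the posterior."""
    return posterior.inv_cdf(alpha), posterior.inv_cdf(beta)


def simulated_coverage(theta, sigma, n, alpha, beta, reps=20000, seed=0):
    """Fraction of simulated intervals that contain the true mean theta."""
    rng = random.Random(seed)
    standard_error = sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample_mean = rng.gauss(theta, standard_error)
        posterior = confidence_posterior(sample_mean, sigma, n)
        lower, upper = confidence_interval(posterior, alpha, beta)
        hits += lower <= theta <= upper
    return hits / reps


if __name__ == "__main__":
    posterior = confidence_posterior(sample_mean=1.2, sigma=2.0, n=25)
    lower, upper = confidence_interval(posterior, 0.025, 0.975)
    # Posterior probability of the fixed, observed interval.
    print(posterior.cdf(upper) - posterior.cdf(lower))
    # Frequentist coverage rate of the random interval.
    print(simulated_coverage(theta=0.0, sigma=2.0, n=25, alpha=0.025, beta=0.975))
```

The first printed value is the confidence posterior probability of the observed
interval and the second estimates the coverage rate of the procedure; both should
be close to the difference between the two quantile levels, which is the sense in
which the confidence posterior equates the two.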

3.2.2 A confidence posterior for testing 

For the application to two-sided testing of a simple null hypothesis, let
$\theta_* = |\theta_{**}|$, the absolute value of a real parameter $\theta_{**}$ of
interest, leading to $\Theta_* = [0, \infty)$. Then $p(x) = S(0; x)$ is equivalent
to a two-tailed p-value for testing the hypothesis that $\theta_{**} = 0$. Since
$\ddot{P}_*(\ddot{\theta}_* \le 0) = S(0; x)$ and since
$\ddot{P}_*(\ddot{\theta}_* \le 0) = \ddot{P}_*(\ddot{\theta}_* = 0)$, it follows
that $p(x) = \ddot{P}_*(\ddot{\theta}_* = 0)$, i.e., the p-value is equal to the
probability that the null hypothesis is true.

If $\ddot{P}_*$ is the only confidence posterior under consideration, then
$\ddot{\mathcal{P}}_* = \{\ddot{P}_*\}$ and there is no need for an inference
process. Following the terminology of Example 3, $\tau : \Theta_* \to \Theta$ is
defined by $\tau(\theta_*) = 1_{(0,\infty)}(\theta_*)$. By implication,
$\ddot{\theta} = 0$ if $\ddot{\theta}_* = 0$ and $\ddot{\theta} = 1$ if
$\ddot{\theta}_* > 0$. Thus, $p(x) = \ddot{P}_*(\ddot{\theta}_* = 0)$ ensures that
$\ddot{P}(\ddot{\theta} = 0) = p(x)$, which in turn implies
$\ddot{P}(\ddot{\theta} = 1) = 1 - p(x)$.



Example 6. In the various t-tests, $\theta_*$ is the mean of $X$ or a difference in
means, and the statistic $t(X; 0)$ is the absolute value of a statistic with a
Student t distribution of known degrees of freedom. The above formalism then gives
the usual two-sided p-value from a t-test as $\ddot{P}(\ddot{\theta} = 0)$ and
$p(x)$. Special cases of this $\ddot{P}$ have been presented as fiducial
distributions (van Berkum et al., 1996; Bickel, 2011d).



3.3 A blended posterior for testing 



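This subsection blends the above set $\dot{\mathcal{P}}$ of Bayesian posteriors with
the above confidence posterior $\ddot{P}$ as prescribed by Section 2.2. Gathering
the results of Sections 3.1 and 3.2,

$$\dot{\mathcal{P}} = \left\{\dot{P} \in \mathcal{P} : \dot{P}(\dot{\theta} = 0) \ge \underline{\varphi}(p(x))\right\};$$

$$\ddot{P}(\ddot{\theta} = 0) = p(x) = 1 - \ddot{P}(\ddot{\theta} = 1).$$

Equation (4) then implies that

$$\dot{\mathcal{P}}(\ddot{P}) = \left\{\dot{P} \in \mathcal{P} : \underline{\varphi}(p(x)) \le \dot{P}(\dot{\theta} = 0) \le 1\right\},$$

in which the first inequality is strict if and only if
$\underline{\varphi}(p(x)) = 0$ and the second inequality is strict unless
$p(x) = 1$. Since $\dot{\mathcal{P}}(\ddot{P})$ is convex, Proposition 1 yields

$$\hat{P}(\hat{\theta} = 0) = \begin{cases} \underline{\varphi}(p(x)) & \text{if } p(x) < \underline{\varphi}(p(x)) \\ p(x) & \text{if } p(x) \ge \underline{\varphi}(p(x)), \end{cases} \qquad (11)$$

where $\hat{\theta}$ is the random variable of distribution $\hat{P}$. With the
identities
$\underline{\varphi}(p(x)) = \underline{\Pr}(H_0 \mid p(X) = p(x))$ and
$\hat{P}(\hat{\theta} = 0) = \Pr(H_0; p(x))$ and with the establishment of equation
(1) by Section 3.1, equation (11) verifies the claim of equation (2) made in
Section 1.2.



4 Remarks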

Remark 1. As mentioned in Section 1.1, the use of Bayes's theorem with proper
priors need not involve subjective interpretations of probability. The set of
posteriors may be determined by interval constraints on the corresponding priors
without any requirement that they model levels of belief (Weichselberger, 2000;
Augustin, 2002, 2004). However, subjective applications of blended inference are
also possible. While the framework was developed with an unknown prior in mind, the
concept of imprecise or indeterminate probability (Walley, 1991) could take the
place of the set in which an unknown prior lies. By allowing the partial order of
agent preferences, imprecise probability theories need not assume the existence of
any true prior (Walley, 1991; Coletti and Scozzafava, 2002). As often happens, the
same mathematical framework is subject to very different philosophical
interpretations.



Remark 2. Technically, the principle of maximum entropy (Paris, 1994; Paris and
Vencovská, 1997) mentioned in Section 2.1.1 could be used if $\Theta$ is finite or
countably infinite. However, unlike the proposed methodology, that practice is
equivalent to making the benchmark posterior $\ddot{P}$ depend on the function
$\tau$ that maps a parameter space to $\Theta$ rather than on a method of data
analysis that is coherent in the sense that its posterior depends on the data
rather than on the hypothesis. If blending with such a method is not desired, one
may average the Bayesian posteriors with respect to some measure that is not a
function of $\Theta$. For example, averaging with respect to the Lebesgue measure,
as Bickel (2011a) did with confidence posteriors, leads to
$(1 + \underline{\varphi}(p(x)))/2$ as the posterior probability of the null
hypothesis under the assumptions of Section 3.1. Remark 5 discusses a more tenable
version of the maximum entropy principle for blended inference.

Remark 3. Using definitions of divergence that include information divergence (3)
as a special case, Grünwald and Dawid (2004) and Topsøe (2004) generalized
variations of Proposition 1. The theory of blended inference extends accordingly.



Remark 4. A generalization of Section 2 in a different direction from that of Remark
3 replaces each "$\inf_{\dot{P} \in \dot{\mathcal{P}}(\ddot{P})}$" of equation (5)
with "$\inf_{\dot{P} \in \dot{\mathcal{P}}}$." For that optimization problem,
Theorem 2 of Topsøe (2007) has the condition that
$\dot{P} \in \dot{\mathcal{P}} \implies I(\dot{P} \| \ddot{P}) < \infty$ in
addition to the convexity of $\dot{\mathcal{P}}$ that Proposition 1 of the present
paper requires. Thus, in that formulation, the blended posterior $\hat{P}$ need not
satisfy equation (6) even if $\dot{\mathcal{P}}$ is convex.



Remark 5. A posterior distribution $\hat{P}$ that is defined by

$$I(\hat{P} \| \ddot{P}) = \inf_{\dot{P} \in \dot{\mathcal{P}}} I(\dot{P} \| \ddot{P}) \qquad (12)$$

satisfies the desiderata of Section 2.3 whether or not the conditions of
Proposition 1 hold. While certain axiomatic systems (e.g., Csiszár, 1991) lead to
this generalization of the principle of maximum entropy (Remark 2), the
optimization problem of equation (5) seems more compelling in this context and
defines $\hat{P}$ even when no distribution satisfying equation (12) exists.



Remark 6. In the presence of multiple comparisons, the confidence posteriors of
Section 3.2.1 may be adjusted to control a family-wise error rate or false coverage
rate (Benjamini and Yekutieli, 2005), if desired. Either error rate would then take
the place of the conventional confidence level as the confidence posterior
probability.